A Bi-clustering Framework for Categorical Data

نویسندگان

  • Ruggero G. Pensa
  • Céline Robardet
  • Jean-François Boulicaut
چکیده

Bi-clustering is a promising conceptual clustering approach. Within categorical data, it provides a collection of (possibly overlapping) bi-clusters, i.e., linked clusters for both objects and attribute-value pairs. We propose a generic framework for bi-clustering which enables to compute a bi-partition from collections of local patterns which capture locally strong associations between objects and properties. To validate this framework, we have studied in details the instance CDK-Means. It is a K-Means-like clustering on collections of formal concepts, i.e., connected closed sets on both dimensions. It enables to build bi-partitions with a user control on overlapping between bi-clusters. We provide an experimental validation on many benchmark datasets and discuss the interestingness of the computed bi-partitions.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

ارائه یک الگوریتم خوشه بندی برای داده های دسته ای با ترکیب معیارها

Clustering is one of the main techniques in data mining. Clustering is a process that classifies data set into groups. In clustering, the data in a cluster are the closest to each other and the data in two different clusters have the most difference. Clustering algorithms are divided into two categories according to the type of data: Clustering algorithms for numerical data and clustering algor...

متن کامل

Bi-level clustering of mixed categorical and numerical biomedical data

Biomedical data sets often have mixed categorical and numerical types, where the former represent semantic information on the objects and the latter represent experimental results. We present the BILCOM algorithm for 'Bi-Level Clustering of Mixed categorical and numerical data types'. BILCOM performs a pseudo-Bayesian process, where the prior is categorical clustering. BILCOM partitions biomedi...

متن کامل

A Framework for Clustering Massive Text and Categorical Data Streams

Many applications such as news group filtering, text crawling, and document organization require real time clustering and segmentation of text data records. The categorical data stream clustering problem also has a number of applications to the problems of customer segmentation and real time trend analysis. We will present an online approach for clustering massive text and categorical data stre...

متن کامل

خوشه‌بندی خودکار داده‌های مختلط با استفاده از الگوریتم ژنتیک

In the real world clustering problems, it is often encountered to perform cluster analysis on data sets with mixed numeric and categorical values. However, most existing clustering algorithms are only efficient for the numeric data rather than the mixed data set. In addition, traditional methods, for example, the K-means algorithm, usually ask the user to provide the number of clusters. In this...

متن کامل

LIMBO: Scalable Clustering of Categorical Data

Clustering is a problem of great practical importance in numerous applications. The problem of clustering becomes more challenging when the data is categorical, that is, when there is no inherent distance measure between data values. We introduce LIMBO, a scalable hierarchical categorical clustering algorithm that builds on the Information Bottleneck (IB) framework for quantifying the relevant ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2005